ABSTRACT


Recent progress in artificial intelligence (AI) has
renewed interest in building systems that learn
and think like people. Many advances have come
from using deep neural networks trained end-to-end
on tasks such as object recognition and computer
games, achieving performance nearly equal to that
of humans. The object recognition systems of
companies such as Facebook and Google are quite
good, but they still cannot recognize objects the
way humans do. Machines are trained on thousands
upon thousands of examples, yet they still cannot
reliably recognize and predict new or unseen
instances of an object. A human can recognize an
object from its back or side view alone, but this is
difficult for a machine. A human can recognize and
understand a situation from the speech or voice he
or she hears, can predict and sense the various
poses and actions of an object, and can recognize a
situation by looking at people's faces: a bright face
is taken to indicate a happy situation, and weeping
a sad one. This paper focuses on how a machine can
also make the right decision based on such images,
poses or actions, voices and sounds, as humans and
other animals do.
1. INTRODUCTION


"In general we are least aware of what our minds
do best." - Marvin Minsky.


The human visual system represents a very
complex and important part of brain activity,
occupying about 30% of the cortex. It enables us
to see colors, detect motion, and perceive
dimensions and distance, and to solve a very wide
range of problems such as image segmentation,
object tracking, and object and activity
recognition. We can also predict how an activity
will unfold from a single pose or action. In other
words, the human mind learns, with or without
supervision, from only a few moments of
observation, and responds by taking the right
decision very quickly. For example, if someone is
pointing a gun at another person, our brain
recognizes almost instantly that we are in danger
and tries to save our life by hiding, running,
counterstriking or taking other diplomatic actions
(apologizing, surrendering, etc.). We can take the
right decision at the right time. The human mind
also responds emotionally to the situation. It
further has a tremendous capacity for image
recognition: we can recognize an object from its
back, side or top view. For instance, even if we see
only someone's back and not the face, we can still
recognize that person very accurately. If we see
two children fighting, our brain decides very
quickly that we must stop them as soon as
possible.


Humans need unity within a group to perform a
task more easily, and they decide to act
collectively only when necessary. We perform
these kinds of tasks very easily, without knowing
the full complexity of what happens in our brain
to solve them.


The paragraphs above describe the activities that
make humans intelligent social animals, that is,
natural intelligence. My study and research
examines why machines cannot yet be as
intelligent as humans.


1.1 Current trends 

There have been remarkable improvements and
much research in Artificial Intelligence. Today
machines can recognize people's faces, simple
poses and actions, buildings and places, and can
identify human palm prints or fingerprints,
heartbeats, retinas, etc. Companies such as Google
and Facebook have active research divisions
exploring AI technologies, and object and speech
recognition systems based on deep learning (a
neural network with at least one hidden layer)
have been deployed in core products on
smartphones and the web. The media has also
covered many of the recent achievements of neural
networks, often expressing the view that neural
networks have achieved this recent success by
virtue of their brain-like computation and thus
their ability to emulate human learning and human
cognition [1]. Google's driverless car is another
significant achievement in Artificial Intelligence,
but as of August 28, 2014 the latest prototype had
not been tested in heavy rain or snow due to
safety concerns. Because the cars rely primarily
on pre-programmed route data, they do not obey
temporary traffic lights and, in some situations,
revert to a slower "extra cautious" mode in
complex unmapped intersections. The vehicle has
difficulty identifying when objects, such as trash
and light debris, are harmless, causing the vehicle
to veer unnecessarily. Additionally, the lidar
technology cannot spot some potholes or discern
when humans, such as a police officer, are
signaling the car to stop [2].


To sum up, machines can now perform many
specific tasks but cannot make decisions on their
own, because decision making is the product of
several steps: recognizing the image of an object
or action, identifying it and predicting new or
possible instances of the recognized object,
thinking through the possible decisions, and
choosing the right one. To think through the
possibilities and reach a right and positive
decision, a machine may need emotion or a
capacity for independent thought, which may be
based on supervised or unsupervised learning.
This paper therefore focuses on these topics, with
the aim of making machines as intelligent as
humans and able to act like humans.


2. BASIC DEFINITIONS 


• Machine Learning- Machine learning is a
type of artificial intelligence (AI) that
provides computers with the ability to learn
without being explicitly programmed.
Machine learning focuses on the
development of computer programs that can
teach themselves to grow and change when
exposed to new data.

• Image Processing- Image processing is a
method to convert an image into digital
form and perform some operations on it, in
order to get an enhanced image or to extract
some useful information from it. It is a type
of signal processing in which the input is an
image, such as a video frame or photograph,
and the output may be an image or
characteristics associated with that image.

• Deep Learning- A neural network with at
least one hidden layer (some networks have
dozens). Most state-of-the-art deep
networks are trained using the
backpropagation algorithm to gradually
adjust their connection strengths.

• Backpropagation- Gradient descent
applied to training a deep neural network.
The gradient of the objective function (e.g.,
classification error or log-likelihood) with
respect to the model parameters (e.g.,
connection weights) is used to make a series
of small adjustments to the parameters in a
direction that improves the objective
function.

• Convolutional network- A neural network
that uses trainable filters instead of (or in
addition to) fully-connected layers with
independent weights. The same filter is
applied at many locations across an image
(or across a time series), leading to neural
networks that are effectively larger but with
local connectivity and fewer free
parameters.

• Markov Chain- A Markov chain is a
sequence of random variables with the
Markov property: the probability of moving
to the next state depends only on the
present state and not on the previous states.

• Hidden Markov Model (HMM)- A Hidden
Markov Model is a finite set of states, each
of which is associated with a (generally
multidimensional) probability distribution.
Transitions among the states are governed
by a set of transition probabilities. In a
particular state an outcome or observation
can be generated according to the
associated probability distribution. Only the
outcome, not the state, is visible to an
external observer; the states are therefore
"hidden" to the outside, hence the name
Hidden Markov Model.

• n-gram model- An n-gram model is a type
of probabilistic language model for
predicting the next item in a sequence in
the form of an (n-1)-order Markov model.
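As a concrete illustration of the n-gram and Markov chain definitions above, the following minimal sketch estimates a bigram model, i.e. an n-gram model with n = 2, which is a first-order Markov model over words: the predicted next word depends only on the current word. The toy corpus and the function names `train_bigram` and `next_word_probs` are hypothetical, and the sketch uses plain maximum-likelihood counts; practical language models also smooth the probabilities of unseen word pairs.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Estimate P(next | current) by counting adjacent word pairs (maximum likelihood)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]  # sentence boundary markers
        for current, nxt in zip(words, words[1:]):
            counts[current][nxt] += 1
    # Normalise each row of counts into a conditional probability distribution.
    return {
        current: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for current, row in counts.items()
    }


def next_word_probs(model, current):
    """Markov property: only the current word is used, never the earlier ones."""
    return model.get(current, {})


# Hypothetical toy corpus.
corpus = ["the robot sees a cat", "the robot sees a dog", "the child sees a cat"]
model = train_bigram(corpus)
print(next_word_probs(model, "robot"))  # {'sees': 1.0}
print(next_word_probs(model, "a"))      # cat with probability 2/3, dog with 1/3
```

Because each prediction looks up a single row of the table, the model's memory of the sentence is exactly one word long, which is what makes it a Markov model.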


3. HUMAN BRAIN AND ARTIFICIAL INTELLIGENCE


The human brain consists of a densely
interconnected set of nerve cells, or basic
information-processing units, called neurons.
Nerve cells, or neurons, are long, thin cells with
branching ends. In the cerebral cortex, which is
where visual processing happens, each neuron
has about 10,000 branches at each end. The
human brain incorporates nearly 10 billion
neurons and 60 trillion connections (synapses)
between them. By using multiple neurons
simultaneously, the brain can perform its
functions much faster than the fastest computers
in existence today. Figure 1 shows the basic
structure of the human brain and a neuron.
Fig 1. Natural Neuron (Human Brain)
Fig 2. Artificial Neural Network (ANN)


Signals are propagated from neuron to neuron
by electrochemical reactions. Chemical
substances released from the synapses enter the
dendrites, raising or lowering the electrical
potential of the cell body. When the potential
reaches a threshold, an electrical pulse, or action
potential, is sent down the axon. The pulse
spreads out along the branches of the axon,
eventually reaching synapses and releasing
transmitters into the bodies of other cells. A basic
artificial neural network is shown in Figure 2. It
consists of a number of very simple and highly
interconnected processors called neurons. The
neurons are connected by weighted links passing
signals from one neuron to another. The output
signal is transmitted through the neuron's
outgoing connection, which splits into a number
of branches that transmit the same signal. The
outgoing branches terminate at the incoming
connections of other neurons in the network.


In the human brain, each synapse has its own
"weight," a factor by which it multiplies the
strength of an incoming signal. The signals
crossing all 10,000 synapses are then added
together in the body of the neuron. Patterns of
stimulation and electrical activity change the
weights of synapses over time, which is the
mechanism by which habits and memories
become ingrained. A key operation in the
branch of mathematics known as linear algebra
is the dot product, which takes two sequences
of numbers (vectors), multiplies their elements
together pairwise, and adds up the results to
yield a single number. In the cortex, the output
of a single neural circuit could thus be thought of
as the dot product of two 10,000-variable vectors.
That is a very large calculation that each neuron
in the brain can do at a stroke [3]. This is the
basic way the human brain operates. The images
and objects seen by the human eye are organized
in the brain by the categories and actions we see.
In the brain map of Fig 3, categories that activate
the same brain areas are shown in similar colors:
for example, humans are green, animals are
yellow, vehicles are pink and violet, and
buildings are blue [4].
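The neuron-as-dot-product view above can be sketched in a few lines: an artificial neuron takes the dot product of its input vector and its synaptic weight vector and "fires" when the sum reaches a threshold, loosely mirroring the action potential described earlier. This is a minimal illustration only; the weights, the threshold and the names `dot` and `neuron_output` are hypothetical, and a cortical neuron would have on the order of 10,000 inputs rather than three.

```python
def dot(xs, ws):
    """Multiply paired elements of two vectors and add up the results."""
    if len(xs) != len(ws):
        raise ValueError("input and weight vectors must have the same length")
    return sum(x * w for x, w in zip(xs, ws))


def neuron_output(inputs, weights, threshold):
    """Step-function neuron: fire (1) if the weighted input sum reaches the threshold."""
    return 1 if dot(inputs, weights) >= threshold else 0


# Hypothetical synaptic weights; a negative weight acts as an inhibitory synapse.
weights = [0.5, -0.2, 0.8]
print(neuron_output([1, 0, 1], weights, 1.0))  # 0.5 + 0.8 = 1.3 >= 1.0, so it fires: 1
print(neuron_output([0, 1, 1], weights, 1.0))  # -0.2 + 0.8 = 0.6 < 1.0, so it stays silent: 0
```

Changing a single weight changes which input patterns make the neuron fire, which is the artificial counterpart of synaptic weights changing over time.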
Fig 3. Brain regions and categorization of
viewed objects [5].


4. MACHINE INTELLIGENCE SOLUTIONS


When designing a system that thinks like a human,
that is, one with human-like machine intelligence,
various factors must be taken into consideration,
and making machine intelligence equal to human
intelligence poses several challenges. In this
section I cover the basic challenges and probable
solutions.


4.1 Image processing and recognition

The basic purpose of this unit is to take an image
as input and to process and recognize it properly.
Basic image processing involves the following
steps:
• Image Acquisition- In this step, the image
is captured by a sensor (such as a
monochrome or color TV camera) and
digitized. If the output of the camera or
sensor is not already in digital form, an
analog-to-digital converter (ADC)
digitizes it.

• Image Enhancement- Enhancement is the
process of manipulating an image so that
the result is more suitable than the original
for a specific application. Enhancement
techniques are very varied and use many
different image processing approaches.

• Image Restoration- Restoration also
improves the appearance of an image, but
unlike enhancement it is objective, being
based on mathematical or probabilistic
models of image degradation.

• Color Image Processing- It uses the color
of the image to extract features of interest
in an image.

• Wavelets- Wavelets are used for image
data compression and pyramidal
(multiresolution) representation.

• Compression- Compression is used for:

o Reducing the storage required to
save an image.

o Reducing the size of the image so
that it can be transmitted with a
suitable bandwidth (e.g., the
JPEG standard).

• Morphological Processing- These are
tools for extracting image components that
are useful in the representation and
description of shape.

• Image Segmentation- In this process, the
computer tries to separate objects from the
image background.

• Representation and Description-
Representation decides whether the data
should be represented as a boundary or as
a complete region. Description extracts
attributes that give quantitative information
of interest or differentiate one class of
objects from another.

• Recognition and Interpretation-
Recognition is the process that assigns a
label to an object based on the information
provided by its descriptors.

• Knowledge base- It controls the
interaction between the modules [6].
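A few of these stages can be chained into a toy pipeline. The sketch below is an illustrative assumption, not the method of [6]: it represents a grey-level image as a list of lists, uses linear contrast stretching for enhancement, a fixed brightness threshold for segmentation, and 4-connected region labelling as a simple representation step that counts the objects found. The function names, the threshold of 128 and the sample image are all hypothetical.

```python
from collections import deque


def stretch_contrast(image):
    """Enhancement: map the darkest pixel to 0 and the brightest to 255 linearly."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        return [[0 for _ in row] for row in image]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in image]


def threshold(image, t=128):
    """Segmentation: foreground (1) where brightness >= t, background (0) elsewhere."""
    return [[1 if p >= t else 0 for p in row] for row in image]


def label_regions(mask):
    """Representation: label 4-connected foreground regions; returns (labels, count)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Hypothetical dark, low-contrast image containing two slightly brighter objects.
raw = [
    [60, 60, 60, 60, 60],
    [60, 90, 90, 60, 60],
    [60, 90, 60, 60, 100],
    [60, 60, 60, 60, 100],
]
labels, n_objects = label_regions(threshold(stretch_contrast(raw)))
print(n_objects)  # 2
```

Thresholding `raw` directly at 128 would find no objects at all; the enhancement step is what makes the two objects separable from the background here.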


This image processing system does not work well
for irregular or handwritten letters and characters;
handwritten characters remain a challenge for
exact recognition. Given a large amount of training
data, many algorithms achieve good performance,
including K-nearest neighbors (5% test error),
support vector machines (about 1% test error),
and convolutional neural networks (below 1% test
error). The best results achieved using deep
convolutional nets are very close to human-level
performance, at an error rate of 0.2%. Convolutional
nets applied to the challenging ImageNet object
recognition task have also produced results close
to human performance [7]. The way people write
characters also varies: for example, some people
add an extra horizontal cross bar when writing ‘7’
or ‘Z’, or write ‘*’ for ‘x’. Additional progress may
be achieved by combining deep learning and
probabilistic program induction to tackle even
more varied versions of the Characters
Challenge [8].
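The K-nearest-neighbors approach mentioned above is simple enough to sketch directly. The example below is a minimal illustration, not the setup behind the quoted error rates: it stores labelled training examples and classifies a new one by a majority vote among the k closest examples under Euclidean distance. The 3x3 binary "digit" bitmaps and the name `knn_predict` are hypothetical toy choices; real benchmarks use thousands of larger images. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training examples."""
    # Sort the labelled examples by Euclidean distance to the query vector.
    nearest = sorted(train, key=lambda example: math.dist(example[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 3x3 binary bitmaps, flattened row by row, for the digits 1 and 7.
train = [
    ((0, 1, 0, 0, 1, 0, 0, 1, 0), "1"),
    ((1, 1, 0, 0, 1, 0, 0, 1, 0), "1"),  # '1' with a serif
    ((0, 1, 0, 0, 1, 0, 1, 1, 1), "1"),  # '1' with a base
    ((1, 1, 1, 0, 0, 1, 0, 0, 1), "7"),
    ((1, 1, 1, 0, 0, 1, 0, 1, 0), "7"),  # slanted '7'
    ((1, 1, 1, 0, 1, 1, 0, 0, 1), "7"),  # '7' with an extra stroke
]
print(knn_predict(train, (0, 1, 0, 0, 1, 0, 0, 1, 1)))  # a sloppy '1': prints 1
print(knn_predict(train, (1, 1, 1, 0, 0, 1, 0, 1, 1)))  # a sloppy '7': prints 7
```

Neither query appears in the training set; each is labelled only because it lies close to several stored variants of the same character, which is why K-nearest neighbors needs many training examples covering the ways people write.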


4.2 Speech recognition 

Speech recognition basically means that we talk to
the computer and the computer understands what
we say. With speech recognition, a machine
understands speech or voice and acts according to
what it hears. Figure 4 shows the basic block
diagram of a speech recognizer. The system is
based on continuous-density Hidden Markov
Models for acoustic modeling and on n-gram
statistical language models. It consists of three
main modules: segmentation, feature extraction,
and decoding. The core module is the speech
decoder, which needs three data sources for its
operation: acoustic models, language models, and
a vocabulary.

Fig 4. Block Diagram of speech recognizer [9]

The input of this system is raw speech or voice,
and the output is the recognized speech. Such a
system could give a machine a form of intuitive
psychology: by sensing the situation from a voice,
it could act appropriately, as a human does.
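Decoding with HMM acoustic models, as in the recognizer above, amounts to finding the most probable sequence of hidden states for a sequence of observations, which is usually done with the Viterbi algorithm. The sketch below assumes a hypothetical two-state model ("silence" versus "speech") observing frame energy ("low" or "high"); all probabilities are invented for illustration. A real recognizer would use many phone states with continuous-density (e.g. Gaussian mixture) emissions, combine them with the n-gram language model, and work with log probabilities to avoid underflow on long inputs.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for `observations` under an HMM."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, prob = max(
                ((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # Trace back from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))


# Hypothetical toy model: is each audio frame silence or speech?
states = ("silence", "speech")
start_p = {"silence": 0.8, "speech": 0.2}
trans_p = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
emit_p = {
    "silence": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.3, "high": 0.7},
}
frames = ["low", "low", "high", "high", "low", "low"]
print(viterbi(frames, states, start_p, trans_p, emit_p))
# ['silence', 'silence', 'speech', 'speech', 'silence', 'silence']
```

The same decoding step also performs a simple segmentation, since the returned path marks where speech starts and stops.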


4.3 Action recognition 

The first element of human intelligence is
identifying an object and its pose or action. Every
action is treated as a sequence of images and
processed in the same way. Figure 5 shows a
system architecture for pose and action recognition.
Fig 5. System Architecture for Action
Recognition [10]
This architecture consists of the following
components.


4.3.1 Inputs and outputs 
A sequence of video frames containing human
activities is taken as input, and the output is an
action. The Pose Classifier takes a single frame as
input and outputs a pose label that describes the
pose of the human in that frame. It has two layers:
a training layer and a classifier layer. In the first
stage of training, a binary classifier is trained for
every pair of poses, which produces N(N-1)/2
classifiers from N poses. Each (pose i, example j)
pair then receives a score equal to the number of
binary classifiers that classified example j as
pose i, and the pose with the highest score for a
given example is picked as the estimated pose for
that example. When all the examples have been
evaluated, an N × N results matrix is computed,
where each cell (i, j) holds the number of
examples of hidden pose i (each example is an
example of a particular "hidden" pose) that were
estimated to be pose j. From this N × N results
matrix we can compute a closeness matrix, which
measures how close any two poses are, or more
specifically, how difficult it is for a classifier to
distinguish between them. The elements of the
closeness matrix are defined as follows:
$$C[i,j] = \frac{R[i,j] + R[j,i]}{\sum_{k} R[i,k] + \sum_{k} R[j,k]}$$

where C is the closeness matrix and R the results
matrix. The equation is symmetric in i and j, so
$C[i,j] = C[j,i]$.
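The closeness computation can be sketched directly from a results matrix. This minimal sketch reads the definition as C[i,j] = (R[i,j] + R[j,i]) / (Σ_k R[i,k] + Σ_k R[j,k]), i.e. the share of the test examples of poses i and j that were confused with each other; that reading of the formula, the zero diagonal, the pose names and the results matrix are all assumptions made for illustration.

```python
def closeness_matrix(R):
    """C[i][j] = (R[i][j] + R[j][i]) / (sum_k R[i][k] + sum_k R[j][k]); diagonal set to 0."""
    n = len(R)
    totals = [sum(row) for row in R]  # number of test examples of each hidden pose
    return [
        [0.0 if i == j else (R[i][j] + R[j][i]) / (totals[i] + totals[j]) for j in range(n)]
        for i in range(n)
    ]


def closest_pair(C):
    """Return the pair of distinct poses (i, j), i < j, that is hardest to tell apart."""
    n = len(C)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return max(pairs, key=lambda p: C[p[0]][p[1]])


poses = ["stand", "walk", "run", "sit"]
# Hypothetical results matrix: R[i][j] = examples of hidden pose i estimated as pose j.
R = [
    [18, 1, 1, 0],
    [2, 12, 6, 0],
    [1, 7, 12, 0],
    [0, 0, 1, 19],
]
C = closeness_matrix(R)
i, j = closest_pair(C)
print(poses[i], poses[j], C[i][j])  # walk run 0.325
```

Walking and running are confused 13 times out of 40 test examples, so they form the first pair to be grouped.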


Once the closeness matrix is computed, the two
poses i and j that are closest to each other are
grouped. That group is then treated as a single
pose, leaving a total of N-1 poses, and binary
classifiers between the other poses and the newly
formed group have to be trained. For efficiency
reasons, they are trained using the fragments left
from the binary classifiers corresponding to poses
i and j. The process is repeated until only two
groups are left, at which point a hierarchical
binary group tree has naturally formed. Every
non-leaf node in the tree represents a pose group
and has an associated binary classifier that splits
the group into two smaller groups, so the
hierarchical group tree represents an N-pose
classifier. When a new image is to be classified, it
is passed through the nodes of the tree until it
reaches a final pose. An alternative would be
simply to look at the results matrix and decide on
the pose from it, but the hierarchical method is
considered the better one. After the pose training
layer, the classifier layer selects the right pose for
the incoming image. The Transition Model layer
models the possible transitions between poses
using a Hidden Markov Model (HMM). The output
of this stage is the pose sequence of the action,
an error-corrected version of the per-frame poses.
The Sequence Classifier then produces a single
action label. The Action Recognition component is
the core of the whole system. It takes as input a
video sequence and the coordinates of the people
in it, and outputs an action label together with the
start and end of the action. Frame extraction,
motion extraction and coordinate extraction are
performed on the input video sequence. Finally,
the Sequence Selector class selects the sequence
of frames to analyze for an action. This system
addresses the particular problem of vision-based
human action recognition for simple actions such
as running, walking, playing and kicking.
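The grouping procedure can be sketched as agglomerative merging driven by the closeness measure C[i,j] = (R[i,j] + R[j,i]) / (Σ_k R[i,k] + Σ_k R[j,k]). One simplifying assumption is loudly different from the described system: instead of retraining binary classifiers after each merge, the sketch pools the rows and columns of the merged poses in the results matrix. The names `build_group_tree` and the example data are hypothetical; the returned nested tuples stand for the hierarchical binary group tree.

```python
def build_group_tree(R, names):
    """Merge the closest pair of pose groups until two groups remain.

    Assumption: merged groups get pooled results-matrix rows/columns rather
    than retrained classifiers. Leaves are pose names; each internal node is
    a (left, right) tuple that would carry its own binary classifier.
    """
    R = [row[:] for row in R]  # work on a copy of the results matrix
    groups = list(names)
    while len(groups) > 2:
        n = len(groups)
        totals = [sum(row) for row in R]
        best, pair = -1.0, None
        for i in range(n):
            for j in range(i + 1, n):
                c = (R[i][j] + R[j][i]) / (totals[i] + totals[j])  # closeness
                if c > best:
                    best, pair = c, (i, j)
        i, j = pair
        # Pool row j into row i and column j into column i, then drop index j.
        R[i] = [a + b for a, b in zip(R[i], R[j])]
        for row in R:
            row[i] += row[j]
        del R[j]
        for row in R:
            del row[j]
        groups[i] = (groups[i], groups[j])
        del groups[j]
    return tuple(groups)  # the root's two children


poses = ["stand", "walk", "run", "sit"]
R = [  # hypothetical results matrix (rows: hidden pose, columns: estimated pose)
    [18, 1, 1, 0],
    [2, 12, 6, 0],
    [1, 7, 12, 0],
    [0, 0, 1, 19],
]
print(build_group_tree(R, poses))  # (('stand', ('walk', 'run')), 'sit')
```

Reading the result top-down gives the classification path: the root classifier separates "sit" from the rest, the next one separates "stand" from the moving poses, and the last one separates "walk" from "run".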


5. INTUITIVE MACHINE 


Humans are physically and psychologically
intuitive. The first step in making a machine
human-like is therefore to make it physically and
psychologically intuitive. The second is machine
learning. There are generally two types of
learning, namely supervised and unsupervised
learning, and the learning strategy is the key
factor in a machine's intelligence. To produce
machines that learn like humans and as fast as
humans do, we may also have to build machines
that learn what humans learn. We believe that
adopting more compositional, causal forms of
knowledge representation helps both humans and
machines get the most from learning-to-learn.
More generally, we believe that all of the core
ingredients for learning rich models articulated
here (compositionality, causality, and
learning-to-learn) can be incorporated into deep
learning systems, and that these ideas will only
benefit from being integrated together. Each one
on its own is valuable, but their synergies are
even more valuable for building truly human-like
machine learning systems [11].


6. EMOTIONAL MACHINE 


When we humans receive good news or find
ourselves in a good situation we become happy,
and bad news or a bad situation makes us sad.
Happiness and sadness can be seen clearly in
facial and other expressions. When we scold a
person harshly, that person becomes sad or may
become aggressive. This is human behavior and
the emotional nature of humans, and various
critical decisions must be based on emotion as
well. The crucial factor in making a machine
behave like a human is therefore to make it
emotional, since emotion is a key factor in right
decisions and self-judgment. A machine also
needs some degree of emotional intelligence to act
and behave like a human, and this paper also
focuses on this point.
The approach to developing theory and
applications for an emotional machine is
data-driven: first observe which states are
naturally communicated from people to computers,
then build and test models that can predict what is
measured and reflected in the data. In one project,
the building of a computerized learning
companion, two of the key affective states found
in the data were interest and boredom, two states
that are not on most theorists' lists of "basic
emotions". However, discriminating these states is
vital to the machine's ability to adapt its teaching
to keep the learning experience going, and that
model addresses them [12]. Emotional intelligence
can be affected by geography, society, people,
culture, science, etc. Human emotion can also be
predicted by examining various natural phenomena
such as heartbeat, facial expression, skin
characteristics, the contraction and relaxation of
the eyebrows, blood pressure, and various speech
and voice patterns. Research has been very
successful in sensing and analyzing these kinds of
human behavior, which can serve as data for
machine learning. A machine's own emotions, in
turn, might be expressed through physical
parameters such as data bit rate or power
consumption; an angry machine, for example,
could express its state by generating various
sounds, signals or alarms.


7. LEARNING PROCEDURE 


The main difference between human and machine
learning is that humans can learn quickly from
little data or a few hints, whereas machines need
large amounts of training data. As explained in
Section 5 (Intuitive Machine), compositionality,
causality, and learning-to-learn play vital roles.

Compositionality is the ability, common in humans,
to construct an infinite number of new
representations from a finite set of available or
primitive elements by combining them. In
computer programming, primitive functions can
be combined to create new functions, and these
new functions can be further combined to create
even more complex functions. Productivity is at
the core of compositionality: an infinite number of
representations can be constructed from a finite
set of primitives, just as the mind can think an
infinite number of thoughts, utter or understand
an infinite number of sentences, or learn new
concepts from a seemingly infinite space of
possibilities [13,14]. For example, a two-wheeled
vehicle might be represented as two wheels
connected by a platform, which provides the base
for a post, which holds the handlebars, and so on.
Parts can themselves be composed of sub-parts,
forming part-whole relationships that can be used
to construct conceptual representations [15].
Wheels (a finite element) can be arranged in many
different combinations: two wheels can form a
motorcycle or a bicycle, and more wheels can form
cars, buses, trucks and so on. Likewise, starting
from letters we can construct numerous words.
This is what humans do with compositionality.
Deep neural networks have limited
compositionality. To capture the full extent of the
mind's compositionality, a model must include
explicit representations of objects, identity, and
relations, all while maintaining a notion of
"coherence" when understanding new
configurations [16]. In the wheel example above,
compositionality and basic intelligence tell us that
vehicles can be constructed from wheels and can
travel distances, but compositionality is unaware
of situations. It does not say that vehicles should
be driven on roads or other safe places;
compositionality alone would even allow a bicycle
assembled from wheels to be ridden on the ocean,
because it is unaware of situations.
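The programming analogy in the paragraph above can be made concrete. The sketch below, with hypothetical names such as `compose`, `part` and `count_parts`, builds new functions from two primitive ones and then composes the results again, and represents the two-wheeled vehicle as a tree of parts and sub-parts. In keeping with the paragraph, nothing in this representation knows where a bicycle can sensibly be ridden.

```python
from functools import reduce


def compose(*functions):
    """Combine functions right-to-left: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), functions, lambda x: x)


def double(x):  # primitive function
    return 2 * x


def increment(x):  # primitive function
    return x + 1


# New functions built from primitives can themselves be combined further.
double_then_increment = compose(increment, double)
twice_over = compose(double_then_increment, double_then_increment)


def part(name, *subparts):
    """Part-whole composition: a concept is a named tree of (sub-)parts."""
    return {"name": name, "parts": list(subparts)}


bicycle = part(
    "bicycle",
    part("wheel"),
    part("wheel"),
    part("platform", part("post", part("handlebars"))),
)


def count_parts(concept, name):
    """Count how many parts called `name` occur anywhere in the part-whole tree."""
    own = 1 if concept["name"] == name else 0
    return own + sum(count_parts(p, name) for p in concept["parts"])


print(double_then_increment(5), twice_over(5))  # 11 23
print(count_parts(bicycle, "wheel"))  # 2
```

Adding two more `part("wheel")` entries would describe a different vehicle from the same primitives, which is the productivity the paragraph describes.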

Causality concerns this notion of coherence: it
uses knowledge of the outside world and its
scenarios to describe situations. In the vehicle
example above, causal knowledge lets a machine
reason, for instance, that if there is no road it
should stop. Deep neural networks may provide
machines with this essential causality [17].

Learning-to-learn is a simple and basic natural
phenomenon: what has been learned before makes
it faster to learn something new. It is said that to
become a saint one has to learn a lot; learning is
how humans build wisdom. Like humans, machines
also need to learn, and the more a machine is
trained, the better it tends to perform.


8. DECISION MAKING AND JUDGMENT


Decision making is a significant part of machine
intelligence, and good judgment means adopting
the right decision. Decision making is the process
of selecting the best option among the available
or possible options. Different decision theories
explain the decision-making process.


In game theory, a decision problem can be
modeled as a triple:

$$d = (\Omega, C, A)$$

where $\Omega$ is a set of possible states of nature,
$C$ is a set of consequences, and $A$ is a set of
actions, $A \subseteq C^{\Omega}$.

If an action $a \in A$ is chosen and the prevailing
state is $\omega \in \Omega$, then a certain consequence
$a(\omega) \in C$ is obtained. Assuming a probability
estimation and a utility function are defined for a
given action $a$ as $p(a): A \to \mathbb{R}$ and
$u: C \to \mathbb{R}$, respectively, a choice function
based on utility theory can be expressed as

$$d = \left\{ a \;\middle|\; \sum_{\omega \in \Omega} u[a(\omega)]\, p(a) = \max_{x \in A} \left( \sum_{\omega \in \Omega} u[x(\omega)]\, p(x) \right) \right\}$$
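The utility-based choice function can be transcribed almost literally: for each action a it computes Σ_ω u[a(ω)] · p(a) and keeps every action that attains the maximum. The text does not say what p(a) measures; here it is assumed, for illustration only, to be an estimated probability that the action can be carried out successfully. The states, actions, consequences and numbers are a hypothetical version of the danger example from the Introduction.

```python
import math


def utility_choice(actions, states, consequence, u, p):
    """Choice function: every action a maximising sum over w of u[a(w)] * p(a)."""
    score = {a: sum(u[consequence[(a, w)]] for w in states) * p[a] for a in actions}
    best = max(score.values())
    return {a for a in actions if math.isclose(score[a], best)}


states = ["armed_threat", "false_alarm"]    # Omega: states of nature (hypothetical)
actions = ["hide", "run", "counterstrike"]  # A: available actions (hypothetical)
consequence = {                             # a(w): consequence of action a in state w
    ("hide", "armed_threat"): "safe", ("hide", "false_alarm"): "delayed",
    ("run", "armed_threat"): "safe", ("run", "false_alarm"): "delayed",
    ("counterstrike", "armed_threat"): "hurt", ("counterstrike", "false_alarm"): "conflict",
}
u = {"safe": 10, "delayed": 2, "hurt": -10, "conflict": -5}  # u: C -> R
p = {"hide": 0.6, "run": 0.8, "counterstrike": 0.3}          # p: A -> R (assumed meaning)
print(utility_choice(actions, states, consequence, u, p))    # {'run'}
```

The result is a set rather than a single action because the choice function is defined as a set: when several actions tie for the maximum, all of them are returned.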


In Bayesian theory, the choice function is called a
decision rule. A loss function, $L$, is adopted to
evaluate the consequence of an action as follows:

$$L: \Omega \times A \to \mathbb{R}$$

where $\Omega$ is the set of all possible states of
nature, $A$ is the set of actions, and $\Omega \times A$
denotes their Cartesian product.
Using the loss function to determine possible
risks, a choice function for decision making can
be derived as follows:

$$d = \left\{ a \;\middle|\; p[L(\omega, a)] = \min_{x \in A} \left( p[L(\omega, x)] \right) \right\}$$

where $p[L(\omega, x)]$ is the expected probability of
loss for action $x$ on $\omega \in \Omega$.
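A common concrete reading of the loss-based decision rule is to pick the action with the smallest expected loss, weighting the loss in each state of nature by that state's probability. The sketch below assumes such a probability distribution over states (the text does not give one explicitly) and uses a hypothetical version of the "two children" example from the Introduction; the name `min_expected_loss` and all numbers are illustrative.

```python
import math


def min_expected_loss(actions, prior, loss):
    """Decision rule: every action a minimising the expected loss sum_w P(w) * L(w, a)."""
    risk = {a: sum(prob * loss[(w, a)] for w, prob in prior.items()) for a in actions}
    best = min(risk.values())
    return {a for a in actions if math.isclose(risk[a], best)}


actions = ["intervene", "ignore"]
loss = {  # L(w, a): hypothetical cost of taking action a when the true state is w
    ("fighting", "intervene"): 0, ("fighting", "ignore"): 10,
    ("playing", "intervene"): 2, ("playing", "ignore"): 0,
}
print(min_expected_loss(actions, {"fighting": 0.3, "playing": 0.7}, loss))  # {'intervene'}
print(min_expected_loss(actions, {"fighting": 0.1, "playing": 0.9}, loss))  # {'ignore'}
```

With a 30% belief that the children are fighting, intervening has expected loss 1.4 against 3.0 for ignoring; at 10% the comparison becomes 1.8 against 1.0, so the same loss table yields the opposite decision.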
The cognitive process of a decision-making
machine can be visualized as in Figure 6, on the
basis of which decision making is performed.
Based on the Layered Reference Model of the
Brain (LRMB) [19] and the Object-Attribute-Relation
(OAR) model developed in cognitive informatics
[20, 21, 22, 23], the cognitive process of decision
making may be informally described by the
following steps:


1. To comprehend the decision making 
problem and to identify the decision 
goal in terms of Object (O) and its 
attributes (A). 

2. To search in the abstract layer of Long 
Term Memory (LTM) [24] for 
alternative solutions (A) and criteria for 
useful decision strategies (C). 

3. To quantify A and C and determine whether
the search should continue.

4. To build a set of decisions using the A
and C obtained in the previous searches.

5. To select the preferred decision(s) on
the basis of the decision makers'
satisfaction.

6. To represent the decision(s) in a new
sub-OAR model.

7. To memorize the sub-OAR model in
LTM.
Fig 6. The cognitive process of decision making
[18]

In this way, a machine can in principle make
decisions. A decision tree is another possible tool
for decision making.


9. CONCLUSION AND FURTHER WORK


As explained in this paper, machines can be made
to think more like humans. If we can improve
causality and the training procedure, machine
intelligence may increase remarkably. If we can
train machines continuously, over a long time, on
real-world data, their learning-to-learn can be
improved. If training data closely matches raw
physical data, intuitive physics may be achieved.
Simple problems of intuitive psychology have been
solved, but not satisfactorily, and deep neural
networks with intuitive psychology should be
developed further. Human intelligence is natural,
but artificial intelligence with the best
performance can be achieved by following how the
human brain and cognition perform. Advances in
cognitive science research can be applied to
machines for real-time machine intelligence. This
is the main point I have tried to explain in this
paper. The day when machine intelligence equals
or nearly equals human intelligence may not be far
away.